surrogate loss
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Singapore (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- North America > United States (0.04)
- (2 more...)
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Ohio (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Fundamental Novel Consistency Theory: $H$-Consistency Bounds
In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance due to computational intractability or lack of differentiability. We present an in-depth study of the target loss estimation error relative to the surrogate loss estimation error. Our analysis leads to $H$-consistency bounds, which are guarantees accounting for the hypothesis set $H$. These bounds offer stronger guarantees than Bayes-consistency or $H$-calibration and are more informative than excess error bounds. We begin with binary classification, establishing tight distribution-dependent and -independent bounds. We provide explicit bounds for convex surrogates (including linear models and neural networks) and analyze the adversarial setting for surrogates like $ρ$-margin and sigmoid loss. Extending to multi-class classification, we present the first $H$-consistency bounds for max, sum, and constrained losses, covering both non-adversarial and adversarial scenarios. We demonstrate that in some cases, non-trivial $H$-consistency bounds are unattainable. We also investigate comp-sum losses (e.g., cross-entropy, MAE), deriving their first $H$-consistency bounds and introducing smooth adversarial variants that yield robust learning algorithms. We develop a comprehensive framework for deriving these bounds across various surrogates, introducing new characterizations for constrained and comp-sum losses. Finally, we examine the growth rates of $H$-consistency bounds, establishing a universal square-root growth rate for smooth surrogates in binary and multi-class tasks, and analyze minimizability gaps to guide surrogate selection.
- North America > Canada > Ontario > Toronto (0.13)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- (4 more...)
Theory and Algorithms for Learning with Multi-Class Abstention and Multi-Expert Deferral
Large language models (LLMs) have achieved remarkable performance but face critical challenges: hallucinations and high inference costs. Leveraging multiple experts offers a solution: deferring uncertain inputs to more capable experts improves reliability, while routing simpler queries to smaller, distilled models enhances efficiency. This motivates the problem of learning with multiple-expert deferral. This thesis presents a comprehensive study of this problem and the related problem of learning with abstention, supported by strong consistency guarantees. First, for learning with abstention (a special case of deferral), we analyze score-based and predictor-rejector formulations in multi-class classification. We introduce new families of surrogate losses and prove strong non-asymptotic, hypothesis set-specific consistency guarantees, resolving two existing open questions. We analyze both single-stage and practical two-stage settings, with experiments on CIFAR-10, CIFAR-100, and SVHN demonstrating the superior performance of our algorithms. Second, we address general multi-expert deferral in classification. We design new surrogate losses for both single-stage and two-stage scenarios and prove they benefit from strong $H$-consistency bounds. For the two-stage scenario, we show that our surrogate losses are realizable $H$-consistent for constant cost functions, leading to effective new algorithms. Finally, we introduce a novel framework for regression with deferral to address continuous label spaces. Our versatile framework accommodates multiple experts and various cost structures, supporting both single-stage and two-stage methods. It subsumes recent work on regression with abstention. We propose new surrogate losses with proven $H$-consistency and demonstrate the empirical effectiveness of the resulting algorithms.
- North America > Canada > Ontario > Toronto (0.13)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Trading off Consistency and Dimensionality of Convex Surrogates for Multiclass Classification
In this paradigm, outcomes must be embedded into the reals with dimension $d \approx n$ in order to design a consistent surrogate loss. Consistent losses are well-motivated theoretically, yet for large $n$, such as in information retrieval and structured prediction tasks, their optimization may be computationally infeasible. In practice, outcomes are typically embedded into some $\mathbb{R}^d$ for $d \ll n$, with little known about their suitability for multiclass classification. We investigate two approaches for trading off consistency and dimensionality in multiclass classification while using a convex surrogate loss. We first formalize partial consistency when the optimized surrogate has dimension $d \ll n$. We then check if partial consistency holds under a given embedding and low-noise assumption, providing insight into when to use a particular embedding into $\mathbb{R}^d$. Finally, we present a new method to construct (fully) consistent losses with $d \ll n$ out of multiple problem instances. Our practical approach leverages parallelism to sidestep lower bounds on $d$.